XLNet is a powerful language model that outperformed BERT on 20+ NLP tasks.
The original implementation requires TensorFlow 1.13.1, which is not pre-installed in Colab.
Running import tensorflow will import the default version (currently 2.x). You can use 1.x by running a cell with the tensorflow_version magic before you run import tensorflow.
%tensorflow_version 1.x
Once you have specified a version via this magic, you can run import tensorflow as normal and verify which version was imported as follows:
import tensorflow
print(tensorflow.__version__)
1.15.2
However, version 1.13.1 specifically cannot be selected this way:
%tensorflow_version 1.13.1
`%tensorflow_version` only switches the major version: 1.x or 2.x. You set: `1.13.1`. This will be interpreted as: `1.x`. TensorFlow 1.x selected.
import tensorflow
print(tensorflow.__version__)
1.15.2
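Because the magic only pins the major version, it is worth asserting programmatically that the imported version is at least the right major release. A minimal sketch (pure string handling, shown with the version strings from above):

```python
# Guard the rest of the notebook against the wrong TensorFlow major version.
# In the notebook, `version` would be tensorflow.__version__.
def major_version(version):
    """Return the major component of a dotted version string as an int."""
    return int(version.split('.')[0])

assert major_version('1.15.2') == 1   # what Colab's 1.x magic provides
assert major_version('2.4.1') == 2    # the default runtime
```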
In this case, we can install a specific TensorFlow version in a conda environment.
For example, to install TensorFlow 1.13.1 under Python 2.7, run the following commands in the terminal (not in the notebook cells):
cd /root
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh Miniconda3-latest-Linux-x86_64.sh
source .bashrc
conda create -n tf113py2 python=2.7
conda activate tf113py2
pip install sentencepiece==0.1.5
conda install tensorflow-gpu=1.13.1
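Once the environment exists, you can confirm from any Python session which interpreter version a given env provides by calling its python binary directly. A sketch (demonstrated with the current interpreter; in the notebook you would point it at /root/miniconda3/envs/tf113py2/bin/python):

```python
import subprocess
import sys

def interpreter_version(python_bin):
    """Ask a Python binary for its own version string, e.g. '2.7.18'."""
    out = subprocess.check_output(
        [python_bin, '-c', 'import platform; print(platform.python_version())'])
    return out.decode().strip()

# Demonstrated with the current interpreter; replace with the conda env's
# python binary to confirm the env was created with the expected version.
print(interpreter_version(sys.executable))
```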
You can back up the conda folder to Google Drive or anywhere else.
tar czvf miniconda3.tar.gz miniconda3/
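The same backup can be scripted from Python with the standard tarfile module; a sketch (using a throwaway directory in place of miniconda3/, so it is safe to run anywhere):

```python
import os
import tarfile
import tempfile

def archive_dir(src_dir, tar_path):
    """Pack a directory into a gzip-compressed tar archive."""
    with tarfile.open(tar_path, 'w:gz') as tar:
        # arcname keeps paths inside the archive relative, like `tar czvf`
        tar.add(src_dir, arcname=os.path.basename(src_dir))

# Demonstration with a temporary stand-in for miniconda3/.
tmp = tempfile.mkdtemp()
demo = os.path.join(tmp, 'miniconda3')
os.makedirs(demo)
with open(os.path.join(demo, 'marker.txt'), 'w') as f:
    f.write('hello')
archive_path = os.path.join(tmp, 'miniconda3.tar.gz')
archive_dir(demo, archive_path)
print(tarfile.is_tarfile(archive_path))   # True
```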
If you want to back up to Google Drive, mount it first:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
! mkdir -p /content/drive/MyDrive/colab_data/conda_backup
! cp miniconda3.tar.gz /content/drive/MyDrive/colab_data/conda_backup
! cp .bashrc /content/drive/MyDrive/colab_data/conda_backup/colab_bashrc
miniconda3.tar.gz
! ls /content/drive/MyDrive/colab_data/conda_backup/
colab_bashrc miniconda3.tar.gz
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
ls /content/drive/MyDrive/colab_data/conda_backup/
colab_bashrc miniconda3.tar.gz
import os
os.chdir('/root')
import sys
print(sys.version)
3.7.10 (default, Feb 20 2021, 21:17:23) [GCC 7.5.0]
! cp /content/drive/MyDrive/colab_data/conda_backup/miniconda3.tar.gz /root/
! tar xaf miniconda3.tar.gz
! cp /content/drive/MyDrive/colab_data/conda_backup/colab_bashrc /root/.bashrc
Then, in the terminal, source the restored file and activate the environment:
source .bashrc
conda activate tf113py2
Please note that the TensorFlow version in the terminal (the conda environment) is different from the version in the notebook.
import tensorflow as tf
print("Tensorflow version " + tf.__version__)
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
    print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
except ValueError:
    raise BaseException('ERROR: Not connected to a TPU runtime; please see the previous cell in this notebook for instructions!')

tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)
tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)
Tensorflow version 2.4.1
Running on TPU  ['10.97.191.18:8470']
INFO:tensorflow:Initializing the TPU system: grpc://10.97.191.18:8470
INFO:tensorflow:Clearing out eager caches
INFO:tensorflow:Finished initializing TPU system.
WARNING:absl:`tf.distribute.experimental.TPUStrategy` is deprecated, please use the non experimental symbol `tf.distribute.TPUStrategy` instead.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)
Google Cloud Storage is required for running XLNet on a TPU. The output files will be written to the Google Cloud Storage bucket.
from google.colab import auth
auth.authenticate_user()
! gcloud init
Welcome! This command will take you through the configuration of gcloud. Settings from your current configuration [default] are: component_manager: disable_update_check: 'True' compute: gce_metadata_read_timeout_sec: '0' core: account: fangli2718@gmail.com Pick configuration to use: [1] Re-initialize this configuration [default] with new settings [2] Create a new configuration Please enter your numeric choice: 2 Enter configuration name. Names start with a lower case letter and contain only lower case letters a-z, digits 0-9, and hyphens '-': fangli3 Your current configuration has been set to: [fangli3] You can skip diagnostics next time by using the following flag: gcloud init --skip-diagnostics Network diagnostic detects and fixes local network connection issues. Reachability Check passed. Network diagnostic passed (1/1 checks passed). Choose the account you would like to use to perform operations for this configuration: [1] fangli2718@gmail.com [2] Log in with a new account Please enter your numeric choice: 1 You are logged in as: [fangli2718@gmail.com]. Pick cloud project to use: [1] xlnet3 [2] Create a new project Please enter numeric choice or text value (must exactly match list item): 1 Your current project has been set to: [xlnet3]. Do you want to configure a default Compute Region and Zone? (Y/n)? y Which Google Compute Engine zone would you like to use as project default? If you do not specify a zone via a command line flag while working with Compute Engine resources, the default is assumed. 
[1] us-east1-b [2] us-east1-c [3] us-east1-d [4] us-east4-c [5] us-east4-b [6] us-east4-a [7] us-central1-c [8] us-central1-a [9] us-central1-f [10] us-central1-b [11] us-west1-b [12] us-west1-c [13] us-west1-a [14] europe-west4-a [15] europe-west4-b [16] europe-west4-c [17] europe-west1-b [18] europe-west1-d [19] europe-west1-c [20] europe-west3-c [21] europe-west3-a [22] europe-west3-b [23] europe-west2-c [24] europe-west2-b [25] europe-west2-a [26] asia-east1-b [27] asia-east1-a [28] asia-east1-c [29] asia-southeast1-b [30] asia-southeast1-a [31] asia-southeast1-c [32] asia-northeast1-b [33] asia-northeast1-c [34] asia-northeast1-a [35] asia-south1-c [36] asia-south1-b [37] asia-south1-a [38] australia-southeast1-b [39] australia-southeast1-c [40] australia-southeast1-a [41] southamerica-east1-b [42] southamerica-east1-c [43] southamerica-east1-a [44] asia-east2-a [45] asia-east2-b [46] asia-east2-c [47] asia-northeast2-a [48] asia-northeast2-b [49] asia-northeast2-c [50] asia-northeast3-a Did not print [24] options. Too many options [74]. Enter "list" at prompt to print choices fully. Please enter numeric choice or text value (must exactly match list item): 9 Your project default Compute Engine zone has been set to [us-central1-f]. You can change it by running [gcloud config set compute/zone NAME]. Your project default Compute Engine region has been set to [us-central1]. You can change it by running [gcloud config set compute/region NAME]. Your Google Cloud SDK is configured and ready to use! * Commands that require authentication will use fangli2718@gmail.com by default * Commands will reference project `xlnet3` by default * Compute Engine commands will use region `us-central1` by default * Compute Engine commands will use zone `us-central1-f` by default Run `gcloud help config` to learn how to change individual settings This gcloud configuration is called [fangli3]. You can create additional configurations if you work with multiple accounts and/or projects. 
Run `gcloud topic configurations` to learn more. Some things to try next: * Run `gcloud --help` to see the Cloud Platform services you can interact with. And run `gcloud help COMMAND` to get help on any gcloud command. * Run `gcloud topic --help` to learn about advanced features of the SDK like arg files and output formatting
Check whether the bucket is accessible:
! gsutil ls gs://fangli3/bioxlnet/xlnet_cased_L-24_H-1024_A-16/
gs://fangli3/bioxlnet/xlnet_cased_L-24_H-1024_A-16/spiece.model gs://fangli3/bioxlnet/xlnet_cased_L-24_H-1024_A-16/xlnet_config.json gs://fangli3/bioxlnet/xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt.data-00000-of-00001 gs://fangli3/bioxlnet/xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt.index gs://fangli3/bioxlnet/xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt.meta
copy the key file to home
! gsutil cp gs://fangli3/bioxlnet/gs_key_fangli3dl4.json /root/
Copying gs://fangli3/bioxlnet/gs_key_fangli3dl4.json... / [1 files][ 2.2 KiB/ 2.2 KiB] Operation completed over 1 objects/2.2 KiB.
check if tensorflow can access the bucket
import tensorflow as tf
tf.io.gfile.exists('gs://fangli3/bioxlnet/v7_base_tf/step96k/model.ckpt.index')
True
Set up the environment variable in the terminal (not in a notebook cell):
export GOOGLE_APPLICATION_CREDENTIALS=/root/gs_key_fangli3dl4.json
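If you prefer to set the credential path from Python (for example, right before spawning the training process from a notebook cell) instead of the terminal, os.environ does the same job for child processes started from that session:

```python
import os

# Python equivalent of `export GOOGLE_APPLICATION_CREDENTIALS=...`;
# inherited by any subprocess launched from this interpreter.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/root/gs_key_fangli3dl4.json'
print(os.environ['GOOGLE_APPLICATION_CREDENTIALS'])
```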
check if tensorflow 1.13.1 can access the bucket
cmd = "import tensorflow as tf\n"
cmd += "print(tf.gfile.Exists('gs://fangli3/'))\n"
with open('check_gcs.py', 'w') as f:
    f.write(cmd + '\n')
Run this command in the terminal: python check_gcs.py
It should print True.
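The write-a-script-then-run-it pattern can also be driven entirely from Python with subprocess, which makes it easy to capture and check the output. A sketch (shown with a trivial stand-in script; in the notebook you would pass the conda env's python binary and the GCS check above):

```python
import os
import subprocess
import sys
import tempfile

def run_script(python_bin, source):
    """Write `source` to a temporary .py file, run it, return stripped stdout."""
    with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as f:
        f.write(source)
        path = f.name
    try:
        return subprocess.check_output([python_bin, path]).decode().strip()
    finally:
        os.remove(path)

print(run_script(sys.executable, "print(True)"))   # True
```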
I have a modified version of the XLNet code that fixes some bugs:
os.chdir('/root')
! wget http://144.34.239.101/c336d0014288370fa581fee196bbec3e/xlnet-modified.tar.gz && tar xaf xlnet-modified.tar.gz
--2021-03-03 18:13:07-- http://144.34.239.101/c336d0014288370fa581fee196bbec3e/xlnet-modified.tar.gz Connecting to 144.34.239.101:80... connected. HTTP request sent, awaiting response... 200 OK Length: 6192525 (5.9M) [application/x-gzip] Saving to: ‘xlnet-modified.tar.gz’ xlnet-modified.tar. 100%[===================>] 5.91M 14.0MB/s in 0.4s 2021-03-03 18:13:08 (14.0 MB/s) - ‘xlnet-modified.tar.gz’ saved [6192525/6192525]
cmd = '''/root/miniconda3/envs/tf113py2/bin/python /root/xlnet/train.py \
--init_checkpoint=gs://fangli3/bioxlnet/xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt \
--alsologtostderr \
--log_dir=/root/bioxlnet/ \
--num_passes=1 \
--train_steps=1000000 \
--learning_rate=3.33e-6 \
--save_steps=5000 \
--iterations=1000 \
--nouncased \
--num_core_per_host=8 \
--record_info_dir=gs://fangli3/bioxlnet/data/256_tokens_ascii_1M_lines_part002/bsz_per_host16.num_core_per_host8.seq_len256.reuse_len128/tfrecords/ \
--train_batch_size=16 \
--seq_len=256 \
--reuse_len=128 \
--mem_len=192 \
--perm_size=128 \
--n_layer=24 \
--d_model=1024 \
--d_embed=1024 \
--n_head=16 \
--d_head=64 \
--d_inner=4096 \
--model_dir=gs://fangli3/bioxlnet/v6_large_tf/model/ \
--untie_r=True \
--mask_alpha=6 \
--mask_beta=1 \
--num_predict=85
'''
sh_file = 'run_train_large.sh'
with open(sh_file, 'w') as sh_f:
    sh_f.write(cmd)
! cat run_train_large.sh
/root/miniconda3/envs/tf113py2/bin/python /root/xlnet/train.py --init_checkpoint=gs://fangli3/bioxlnet/xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt --alsologtostderr --log_dir=/root/bioxlnet/ --num_passes=1 --train_steps=1000000 --learning_rate=3.33e-6 --save_steps=5000 --iterations=1000 --nouncased --num_core_per_host=8 --record_info_dir=gs://fangli3/bioxlnet/data/256_tokens_ascii_1M_lines_part002/bsz_per_host16.num_core_per_host8.seq_len256.reuse_len128/tfrecords/ --train_batch_size=16 --seq_len=256 --reuse_len=128 --mem_len=192 --perm_size=128 --n_layer=24 --d_model=1024 --d_embed=1024 --n_head=16 --d_head=64 --d_inner=4096 --model_dir=gs://fangli3/bioxlnet/v6_large_tf/model/ --untie_r=True --mask_alpha=6 --mask_beta=1 --num_predict=85
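A flag list this long is easier to maintain as a dict rendered into the command line. A hypothetical helper (not part of the XLNet code) illustrating the pattern:

```python
def render_flags(flags):
    """Render a dict into `--key=value` arguments; None means a bare flag."""
    parts = []
    for key, value in flags.items():
        if value is None:               # bare boolean flag like --nouncased
            parts.append('--%s' % key)
        else:
            parts.append('--%s=%s' % (key, value))
    return ' '.join(parts)

# A few of the flags from the training command above.
print(render_flags({'seq_len': 256, 'reuse_len': 128, 'nouncased': None}))
# --seq_len=256 --reuse_len=128 --nouncased
```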
run this command in terminal, not in the notebook:
sh run_train_large.sh &> run_train_large.sh.log
! tail -f run_train_large.sh.log
INFO:tensorflow: name = model/transformer/layer_23/ff/LayerNorm/gamma/Adam_1:0, shape = (1024,) I0303 18:27:26.640438 139698700453760 model_utils.py:91] name = model/transformer/layer_23/ff/LayerNorm/gamma/Adam_1:0, shape = (1024,) INFO:tensorflow: name = model/lm_loss/bias/Adam:0, shape = (32000,) I0303 18:27:26.640531 139698700453760 model_utils.py:91] name = model/lm_loss/bias/Adam:0, shape = (32000,) INFO:tensorflow: name = model/lm_loss/bias/Adam_1:0, shape = (32000,) I0303 18:27:26.640599 139698700453760 model_utils.py:91] name = model/lm_loss/bias/Adam_1:0, shape = (32000,) INFO:tensorflow:Create CheckpointSaverHook. I0303 18:27:29.580889 139698700453760 basic_session_run_hooks.py:527] Create CheckpointSaverHook. INFO:tensorflow:Done calling model_fn. I0303 18:27:30.186203 139698700453760 estimator.py:1113] Done calling model_fn. INFO:tensorflow:TPU job name tpu_worker I0303 18:27:36.146019 139698700453760 tpu_estimator.py:447] TPU job name tpu_worker INFO:tensorflow:Graph was finalized. I0303 18:27:38.824338 139698700453760 monitored_session.py:222] Graph was finalized. WARNING:tensorflow:From /root/miniconda3/envs/tf113py2/lib/python2.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file APIs to check for files with this prefix. W0303 18:27:38.915483 139698700453760 deprecation.py:323] From /root/miniconda3/envs/tf113py2/lib/python2.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file APIs to check for files with this prefix. 
INFO:tensorflow:Restoring parameters from gs://fangli3/bioxlnet/v6_large_tf/model/model.ckpt-570000 I0303 18:27:39.021770 139698700453760 saver.py:1270] Restoring parameters from gs://fangli3/bioxlnet/v6_large_tf/model/model.ckpt-570000 ^C
transformers library (non-official version)
This library is developed by Hugging Face (https://huggingface.co/).
! pip install transformers
! pip install datasets
! pip install seqeval
Collecting transformers
Downloading https://files.pythonhosted.org/packages/f9/54/5ca07ec9569d2f232f3166de5457b63943882f7950ddfcc887732fc7fb23/transformers-4.3.3-py3-none-any.whl (1.9MB)
|████████████████████████████████| 1.9MB 6.4MB/s
Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from transformers) (2.23.0)
Collecting sacremoses
Downloading https://files.pythonhosted.org/packages/7d/34/09d19aff26edcc8eb2a01bed8e98f13a1537005d31e95233fd48216eed10/sacremoses-0.0.43.tar.gz (883kB)
|████████████████████████████████| 890kB 23.8MB/s
Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from transformers) (3.0.12)
Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from transformers) (20.9)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.7/dist-packages (from transformers) (1.19.5)
Requirement already satisfied: importlib-metadata; python_version < "3.8" in /usr/local/lib/python3.7/dist-packages (from transformers) (3.7.0)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.7/dist-packages (from transformers) (2019.12.20)
Collecting tokenizers<0.11,>=0.10.1
Downloading https://files.pythonhosted.org/packages/71/23/2ddc317b2121117bf34dd00f5b0de194158f2a44ee2bf5e47c7166878a97/tokenizers-0.10.1-cp37-cp37m-manylinux2010_x86_64.whl (3.2MB)
|████████████████████████████████| 3.2MB 32.3MB/s
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.7/dist-packages (from transformers) (4.41.1)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (2020.12.5)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (2.10)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (1.24.3)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (3.0.4)
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers) (1.15.0)
Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers) (7.1.2)
Requirement already satisfied: joblib in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers) (1.0.1)
Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging->transformers) (2.4.7)
Requirement already satisfied: typing-extensions>=3.6.4; python_version < "3.8" in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < "3.8"->transformers) (3.7.4.3)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < "3.8"->transformers) (3.4.0)
Building wheels for collected packages: sacremoses
Building wheel for sacremoses (setup.py) ... done
Created wheel for sacremoses: filename=sacremoses-0.0.43-cp37-none-any.whl size=893262 sha256=8b26b029ab018abab9c1d851e6e65677be60a25825fc7c26ffce4d8f99d2479d
Stored in directory: /root/.cache/pip/wheels/29/3c/fd/7ce5c3f0666dab31a50123635e6fb5e19ceb42ce38d4e58f45
Successfully built sacremoses
Installing collected packages: sacremoses, tokenizers, transformers
Successfully installed sacremoses-0.0.43 tokenizers-0.10.1 transformers-4.3.3
Collecting datasets
Downloading https://files.pythonhosted.org/packages/91/8e/68011343a74dfb7bb2e59ea10b3191c7d55b43c8239356875609a56d7c71/datasets-1.4.0-py3-none-any.whl (186kB)
|████████████████████████████████| 194kB 5.1MB/s
Collecting huggingface-hub==0.0.2
Downloading https://files.pythonhosted.org/packages/b5/93/7cb0755c62c36cdadc70c79a95681df685b52cbaf76c724facb6ecac3272/huggingface_hub-0.0.2-py3-none-any.whl
Requirement already satisfied: importlib-metadata; python_version < "3.8" in /usr/local/lib/python3.7/dist-packages (from datasets) (3.7.0)
Requirement already satisfied: multiprocess in /usr/local/lib/python3.7/dist-packages (from datasets) (0.70.11.1)
Requirement already satisfied: pandas in /usr/local/lib/python3.7/dist-packages (from datasets) (1.1.5)
Collecting fsspec
Downloading https://files.pythonhosted.org/packages/91/0d/a6bfee0ddf47b254286b9bd574e6f50978c69897647ae15b14230711806e/fsspec-0.8.7-py3-none-any.whl (103kB)
|████████████████████████████████| 112kB 9.2MB/s
Requirement already satisfied: pyarrow>=0.17.1 in /usr/local/lib/python3.7/dist-packages (from datasets) (3.0.0)
Requirement already satisfied: dill in /usr/local/lib/python3.7/dist-packages (from datasets) (0.3.3)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.7/dist-packages (from datasets) (1.19.5)
Requirement already satisfied: requests>=2.19.0 in /usr/local/lib/python3.7/dist-packages (from datasets) (2.23.0)
Collecting xxhash
Downloading https://files.pythonhosted.org/packages/e7/27/1c0b37c53a7852f1c190ba5039404d27b3ae96a55f48203a74259f8213c9/xxhash-2.0.0-cp37-cp37m-manylinux2010_x86_64.whl (243kB)
|████████████████████████████████| 245kB 9.3MB/s
Requirement already satisfied: tqdm<4.50.0,>=4.27 in /usr/local/lib/python3.7/dist-packages (from datasets) (4.41.1)
Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from huggingface-hub==0.0.2->datasets) (3.0.12)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < "3.8"->datasets) (3.4.0)
Requirement already satisfied: typing-extensions>=3.6.4; python_version < "3.8" in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < "3.8"->datasets) (3.7.4.3)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas->datasets) (2018.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas->datasets) (2.8.1)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets) (2.10)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets) (1.24.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets) (2020.12.5)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.7.3->pandas->datasets) (1.15.0)
Installing collected packages: huggingface-hub, fsspec, xxhash, datasets
Successfully installed datasets-1.4.0 fsspec-0.8.7 huggingface-hub-0.0.2 xxhash-2.0.0
Collecting seqeval
Downloading https://files.pythonhosted.org/packages/9d/2d/233c79d5b4e5ab1dbf111242299153f3caddddbb691219f363ad55ce783d/seqeval-1.2.2.tar.gz (43kB)
|████████████████████████████████| 51kB 342kB/s
Requirement already satisfied: numpy>=1.14.0 in /usr/local/lib/python3.7/dist-packages (from seqeval) (1.19.5)
Requirement already satisfied: scikit-learn>=0.21.3 in /usr/local/lib/python3.7/dist-packages (from seqeval) (0.22.2.post1)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>=0.21.3->seqeval) (1.0.1)
Requirement already satisfied: scipy>=0.17.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>=0.21.3->seqeval) (1.4.1)
Building wheels for collected packages: seqeval
Building wheel for seqeval (setup.py) ... done
Created wheel for seqeval: filename=seqeval-1.2.2-cp37-none-any.whl size=16172 sha256=7f6f6c22e15ff18650d3e171e3978bb2fe587ca7610409dad470daa44e275709
Stored in directory: /root/.cache/pip/wheels/52/df/1b/45d75646c37428f7e626214704a0e35bd3cfc32eda37e59e5f
Successfully built seqeval
Installing collected packages: seqeval
Successfully installed seqeval-1.2.2
XLA: Accelerated Linear Algebra
! pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.7-cp37-cp37m-linux_x86_64.whl
Collecting cloud-tpu-client==0.10
Downloading https://files.pythonhosted.org/packages/56/9f/7b1958c2886db06feb5de5b2c191096f9e619914b6c31fdf93999fdbbd8b/cloud_tpu_client-0.10-py3-none-any.whl
Collecting torch-xla==1.7
Downloading https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.7-cp37-cp37m-linux_x86_64.whl (133.6MB)
|████████████████████████████████| 133.6MB 73kB/s
Collecting google-api-python-client==1.8.0
Downloading https://files.pythonhosted.org/packages/9a/b4/a955f393b838bc47cbb6ae4643b9d0f90333d3b4db4dc1e819f36aad18cc/google_api_python_client-1.8.0-py3-none-any.whl (57kB)
|████████████████████████████████| 61kB 2.1MB/s
Requirement already satisfied: oauth2client in /usr/local/lib/python3.7/dist-packages (from cloud-tpu-client==0.10) (4.1.3)
Requirement already satisfied: google-auth-httplib2>=0.0.3 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client==1.8.0->cloud-tpu-client==0.10) (0.0.4)
Requirement already satisfied: six<2dev,>=1.6.1 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client==1.8.0->cloud-tpu-client==0.10) (1.15.0)
Requirement already satisfied: google-api-core<2dev,>=1.13.0 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client==1.8.0->cloud-tpu-client==0.10) (1.16.0)
Requirement already satisfied: httplib2<1dev,>=0.9.2 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client==1.8.0->cloud-tpu-client==0.10) (0.17.4)
Requirement already satisfied: uritemplate<4dev,>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client==1.8.0->cloud-tpu-client==0.10) (3.0.1)
Requirement already satisfied: google-auth>=1.4.1 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client==1.8.0->cloud-tpu-client==0.10) (1.27.0)
Requirement already satisfied: rsa>=3.1.4 in /usr/local/lib/python3.7/dist-packages (from oauth2client->cloud-tpu-client==0.10) (4.7.2)
Requirement already satisfied: pyasn1-modules>=0.0.5 in /usr/local/lib/python3.7/dist-packages (from oauth2client->cloud-tpu-client==0.10) (0.2.8)
Requirement already satisfied: pyasn1>=0.1.7 in /usr/local/lib/python3.7/dist-packages (from oauth2client->cloud-tpu-client==0.10) (0.4.8)
Requirement already satisfied: pytz in /usr/local/lib/python3.7/dist-packages (from google-api-core<2dev,>=1.13.0->google-api-python-client==1.8.0->cloud-tpu-client==0.10) (2018.9)
Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core<2dev,>=1.13.0->google-api-python-client==1.8.0->cloud-tpu-client==0.10) (1.52.0)
Requirement already satisfied: setuptools>=34.0.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core<2dev,>=1.13.0->google-api-python-client==1.8.0->cloud-tpu-client==0.10) (54.0.0)
Requirement already satisfied: protobuf>=3.4.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core<2dev,>=1.13.0->google-api-python-client==1.8.0->cloud-tpu-client==0.10) (3.12.4)
Requirement already satisfied: requests<3.0.0dev,>=2.18.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core<2dev,>=1.13.0->google-api-python-client==1.8.0->cloud-tpu-client==0.10) (2.23.0)
Requirement already satisfied: cachetools<5.0,>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from google-auth>=1.4.1->google-api-python-client==1.8.0->cloud-tpu-client==0.10) (4.2.1)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0dev,>=2.18.0->google-api-core<2dev,>=1.13.0->google-api-python-client==1.8.0->cloud-tpu-client==0.10) (2020.12.5)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0dev,>=2.18.0->google-api-core<2dev,>=1.13.0->google-api-python-client==1.8.0->cloud-tpu-client==0.10) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0dev,>=2.18.0->google-api-core<2dev,>=1.13.0->google-api-python-client==1.8.0->cloud-tpu-client==0.10) (1.24.3)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0dev,>=2.18.0->google-api-core<2dev,>=1.13.0->google-api-python-client==1.8.0->cloud-tpu-client==0.10) (2.10)
Installing collected packages: google-api-python-client, cloud-tpu-client, torch-xla
Found existing installation: google-api-python-client 1.7.12
Uninstalling google-api-python-client-1.7.12:
Successfully uninstalled google-api-python-client-1.7.12
Successfully installed cloud-tpu-client-0.10 google-api-python-client-1.8.0 torch-xla-1.7
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
import os
os.chdir('/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet')
train_file = '/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/data/tiny.txt'
test_file = train_file
xla_spawn = '/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/transformers-4.2.2/examples/xla_spawn.py'
cmd = '''
TRAIN_FILE=/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/data/tiny.txt
TEST_FILE=/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/data/tiny.txt
N_CPU=1
XLA_SPAWN=/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/transformers-4.2.2/examples/xla_spawn.py
RUN_PLM=/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/transformers-4.2.2/examples/language-modeling/run_plm.py
WORK_DIR=/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/v1_base/
MODEL_DIR=${WORK_DIR}/huggingface_model/
NUM_EPOCHS=10
MODEL_NAME_OR_PATH=xlnet-base-cased
MAX_LENGTH=512
GRAD_ACCU_STEPS=2
PER_DEVICE_BATCH_SIZE=2
SEED=11
LOG_STEP=1
SAVE_STEP=100
LR=5e-5
WARM_UP_STEP=10
RUN_NAME=v1_base_huggingface
mkdir -p ${MODEL_DIR}
/usr/local/bin/python ${XLA_SPAWN} \
--num_cores 8 \
${RUN_PLM} \
--preprocessing_num_workers ${N_CPU} \
--run_name ${RUN_NAME} \
--dataloader_num_workers ${N_CPU} \
--model_name_or_path ${MODEL_NAME_OR_PATH} \
--max_seq_length ${MAX_LENGTH} \
--per_device_train_batch_size ${PER_DEVICE_BATCH_SIZE} \
--per_device_eval_batch_size ${PER_DEVICE_BATCH_SIZE} \
--gradient_accumulation_steps ${GRAD_ACCU_STEPS} \
--num_train_epochs ${NUM_EPOCHS} \
--train_file ${TRAIN_FILE} \
--validation_file ${TEST_FILE} \
--do_train \
--do_eval \
--output_dir ${MODEL_DIR} \
--overwrite_output_dir \
--save_steps ${SAVE_STEP} \
--seed ${SEED} \
--logging_first_step \
--logging_steps $LOG_STEP \
--learning_rate ${LR} \
--warmup_steps ${WARM_UP_STEP} \
--pad_to_max_length
'''
with open('run_xla_train.sh', 'w') as out_f:
    out_f.write(cmd)
! cat run_xla_train.sh
TRAIN_FILE=/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/data/tiny.txt
TEST_FILE=/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/data/tiny.txt
N_CPU=1
XLA_SPAWN=/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/transformers-4.2.2/examples/xla_spawn.py
RUN_PLM=/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/transformers-4.2.2/examples/language-modeling/run_plm.py
WORK_DIR=/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/v1_base/
MODEL_DIR=${WORK_DIR}/huggingface_model/
NUM_EPOCHS=10
MODEL_NAME_OR_PATH=xlnet-base-cased
MAX_LENGTH=512
GRAD_ACCU_STEPS=2
PER_DEVICE_BATCH_SIZE=2
SEED=11
LOG_STEP=1
SAVE_STEP=100
LR=5e-5
WARM_UP_STEP=10
RUN_NAME=v1_base_huggingface
mkdir -p ${MODEL_DIR}
/usr/local/bin/python ${XLA_SPAWN} --num_cores 8 ${RUN_PLM} --preprocessing_num_workers ${N_CPU} --run_name ${RUN_NAME} --dataloader_num_workers ${N_CPU} --model_name_or_path ${MODEL_NAME_OR_PATH} --max_seq_length ${MAX_LENGTH} --per_device_train_batch_size ${PER_DEVICE_BATCH_SIZE} --per_device_eval_batch_size ${PER_DEVICE_BATCH_SIZE} --gradient_accumulation_steps ${GRAD_ACCU_STEPS} --num_train_epochs ${NUM_EPOCHS} --train_file ${TRAIN_FILE} --validation_file ${TEST_FILE} --do_train --do_eval --output_dir ${MODEL_DIR} --overwrite_output_dir --save_steps ${SAVE_STEP} --seed ${SEED} --logging_first_step --logging_steps $LOG_STEP --learning_rate ${LR} --warmup_steps ${WARM_UP_STEP} --pad_to_max_length
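For reference, the step counts the trainer reports later can be reproduced from these hyperparameters alone. A minimal sketch (variable names are illustrative; the 8 cores reflect a standard Colab TPU, and the example count 2918 is taken from the training log further down):

```python
# Reproduce the effective batch size and optimization-step count implied by
# the hyperparameters above.
per_device_batch_size = 2   # PER_DEVICE_BATCH_SIZE
num_cores = 8               # --num_cores passed to xla_spawn.py
grad_accum_steps = 2        # GRAD_ACCU_STEPS
num_examples = 2918         # tokenized training examples (from the log)
num_epochs = 10             # NUM_EPOCHS

# One optimizer step consumes per_device_batch * cores * accumulation examples.
total_batch_size = per_device_batch_size * num_cores * grad_accum_steps  # 32
steps_per_epoch = num_examples // total_batch_size                       # 91
total_steps = steps_per_epoch * num_epochs                               # 910
```

These match the `Total train batch size = 32` and `Total optimization steps = 910` lines the trainer prints at startup.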
! sh run_xla_train.sh
WARNING:root:Waiting for TPU to be start up with version pytorch-1.7...
WARNING:root:TPU has started up successfully with version pytorch-1.7
2021-03-03 18:39:42.179972: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:run_plm:Process rank: -1, device: xla:1, n_gpu: 0, distributed training: False, 16-bits training: False
WARNING:datasets.builder:Using custom data configuration default-0d49a6dc5a2016e7
Downloading and preparing dataset text/default (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /root/.cache/huggingface/datasets/text/default-0d49a6dc5a2016e7/0.0.0/293ecb642f9fca45b44ad1f90c8445c54b9d80b95ab3fca3cfa5e1e3d85d4a57...
Dataset text downloaded and prepared to /root/.cache/huggingface/datasets/text/default-0d49a6dc5a2016e7/0.0.0/293ecb642f9fca45b44ad1f90c8445c54b9d80b95ab3fca3cfa5e1e3d85d4a57. Subsequent calls will reuse this data.
03/03/2021 18:40:29 - WARNING - datasets.builder - Reusing dataset text (/root/.cache/huggingface/datasets/text/default-0d49a6dc5a2016e7/0.0.0/293ecb642f9fca45b44ad1f90c8445c54b9d80b95ab3fca3cfa5e1e3d85d4a57)
Downloading: 100% 760/760 [00:00<00:00, 236kB/s]
[INFO|configuration_utils.py:449] 2021-03-03 18:40:29,949 >> loading configuration file https://huggingface.co/xlnet-base-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/06bdb0f5882dbb833618c81c3b4c996a0c79422fa2c95ffea3827f92fc2dba6b.da982e2e596ec73828dbae86525a1870e513bd63aae5a2dc773ccc840ac5c346
[INFO|configuration_utils.py:485] 2021-03-03 18:40:29,951 >> Model config XLNetConfig {
"architectures": [
"XLNetLMHeadModel"
],
"attn_type": "bi",
"bi_data": false,
"bos_token_id": 1,
"clamp_len": -1,
"d_head": 64,
"d_inner": 3072,
"d_model": 768,
"dropout": 0.1,
"end_n_top": 5,
"eos_token_id": 2,
"ff_activation": "gelu",
"initializer_range": 0.02,
"layer_norm_eps": 1e-12,
"mem_len": null,
"model_type": "xlnet",
"n_head": 12,
"n_layer": 12,
"pad_token_id": 5,
"reuse_len": null,
"same_length": false,
"start_n_top": 5,
"summary_activation": "tanh",
"summary_last_dropout": 0.1,
"summary_type": "last",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 250
}
},
"transformers_version": "4.3.3",
"untie_r": true,
"use_mems_eval": true,
"use_mems_train": false,
"vocab_size": 32000
}
Downloading: 100% 798k/798k [00:00<00:00, 5.38MB/s]
Downloading: 100% 1.38M/1.38M [00:00<00:00, 7.47MB/s]
[INFO|tokenization_utils_base.py:1786] 2021-03-03 18:40:30,898 >> loading file https://huggingface.co/xlnet-base-cased/resolve/main/spiece.model from cache at /root/.cache/huggingface/transformers/df73bc9f8d13bf2ea4dab95624895e45a550a0f0a825e41fc25440bf367ee3c8.d93497120e3a865e2970f26abdf7bf375896f97fde8b874b70909592a6c785c9
[INFO|tokenization_utils_base.py:1786] 2021-03-03 18:40:30,898 >> loading file https://huggingface.co/xlnet-base-cased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/46f47734f3dcaef7e236b9a3e887f27814e18836a8db7e6a49148000058a1a54.2a683f915238b4f560dab0c724066cf0a7de9a851e96b0fb3a1e7f0881552f53
[INFO|file_utils.py:1302] 2021-03-03 18:40:31,276 >> https://huggingface.co/xlnet-base-cased/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp4xu840rv
Downloading: 100% 467M/467M [00:09<00:00, 51.4MB/s]
[INFO|file_utils.py:1306] 2021-03-03 18:40:40,731 >> storing https://huggingface.co/xlnet-base-cased/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/9461853998373b0b2f8ef8011a13b62a2c5f540b2c535ef3ea46ed8a062b16a9.3e214f11a50e9e03eb47535b58522fc3cc11ac67c120a9450f6276de151af987
[INFO|file_utils.py:1309] 2021-03-03 18:40:40,731 >> creating metadata file for /root/.cache/huggingface/transformers/9461853998373b0b2f8ef8011a13b62a2c5f540b2c535ef3ea46ed8a062b16a9.3e214f11a50e9e03eb47535b58522fc3cc11ac67c120a9450f6276de151af987
[INFO|modeling_utils.py:1027] 2021-03-03 18:40:40,732 >> loading weights file https://huggingface.co/xlnet-base-cased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/9461853998373b0b2f8ef8011a13b62a2c5f540b2c535ef3ea46ed8a062b16a9.3e214f11a50e9e03eb47535b58522fc3cc11ac67c120a9450f6276de151af987
[INFO|modeling_utils.py:1143] 2021-03-03 18:41:15,062 >> All model checkpoint weights were used when initializing XLNetLMHeadModel.
[INFO|modeling_utils.py:1152] 2021-03-03 18:41:15,070 >> All the weights of XLNetLMHeadModel were initialized from the model checkpoint at xlnet-base-cased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use XLNetLMHeadModel for predictions without further training.
100% 9/9 [00:44<00:00, 4.97s/ba]
/usr/local/lib/python3.7/dist-packages/transformers/trainer.py:705: FutureWarning: `model_path` is deprecated and will be removed in a future version. Use `resume_from_checkpoint` instead.
FutureWarning,
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[INFO|trainer.py:432] 2021-03-03 18:48:19,351 >> The following columns in the training set don't have a corresponding argument in `XLNetLMHeadModel.forward` and have been ignored: .
[INFO|trainer.py:432] 2021-03-03 18:48:19,356 >> The following columns in the evaluation set don't have a corresponding argument in `XLNetLMHeadModel.forward` and have been ignored: .
[INFO|trainer.py:837] 2021-03-03 18:48:19,404 >> ***** Running training *****
[INFO|trainer.py:838] 2021-03-03 18:48:19,404 >> Num examples = 2918
[INFO|trainer.py:839] 2021-03-03 18:48:19,404 >> Num Epochs = 10
[INFO|trainer.py:840] 2021-03-03 18:48:19,404 >> Instantaneous batch size per device = 2
[INFO|trainer.py:841] 2021-03-03 18:48:19,404 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:842] 2021-03-03 18:48:19,405 >> Gradient Accumulation steps = 2
[INFO|trainer.py:843] 2021-03-03 18:48:19,405 >> Total optimization steps = 910
0% 0/910 [00:00<?, ?it/s]
{'loss': 5.5622, 'learning_rate': 5e-06, 'epoch': 0.01}
{'loss': 5.9572, 'learning_rate': 1e-05, 'epoch': 0.02}
{'loss': 4.9327, 'learning_rate': 1.5e-05, 'epoch': 0.03}
{'loss': 4.0891, 'learning_rate': 2e-05, 'epoch': 0.04}
{'loss': 3.6777, 'learning_rate': 2.5e-05, 'epoch': 0.05}
{'loss': 4.2414, 'learning_rate': 3e-05, 'epoch': 0.07}
{'loss': 4.0288, 'learning_rate': 3.5e-05, 'epoch': 0.08}
{'loss': 3.2993, 'learning_rate': 4e-05, 'epoch': 0.09}
{'loss': 3.4531, 'learning_rate': 4.5e-05, 'epoch': 0.1}
{'loss': 3.5838, 'learning_rate': 5e-05, 'epoch': 0.11}
{'loss': 3.2614, 'learning_rate': 4.994444444444445e-05, 'epoch': 0.12}
{'loss': 2.8368, 'learning_rate': 4.9888888888888894e-05, 'epoch': 0.13}
{'loss': 3.3508, 'learning_rate': 4.9833333333333336e-05, 'epoch': 0.14}
{'loss': 3.6209, 'learning_rate': 4.977777777777778e-05, 'epoch': 0.15}
2% 14/910 [06:28<1:27:10, 5.84s/it]
Traceback (most recent call last):
File "/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/transformers-4.2.2/examples/xla_spawn.py", line 85, in <module>
main()
File "/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/transformers-4.2.2/examples/xla_spawn.py", line 81, in main
xmp.spawn(mod._mp_fn, args=(), nprocs=args.num_cores)
File "/usr/local/lib/python3.7/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 395, in spawn
start_method=start_method)
File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
while not context.join():
File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 77, in join
timeout=timeout,
File "/usr/lib/python3.7/multiprocessing/connection.py", line 921, in wait
ready = selector.select(timeout)
File "/usr/lib/python3.7/selectors.py", line 415, in select
fd_event_list = self._selector.poll(timeout)
KeyboardInterrupt
2% 14/910 [06:29<6:55:47, 27.84s/it]
^C
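The learning rates logged above follow the Trainer's default linear schedule: linear warmup over `WARM_UP_STEP` steps up to `LR`, then linear decay toward zero over the remaining optimization steps (the 910 total comes from the startup log). A sketch reproducing the logged values, assuming that default schedule (the function name is illustrative; HuggingFace implements this as `get_linear_schedule_with_warmup`):

```python
def lr_at(step, base_lr=5e-5, warmup_steps=10, total_steps=910):
    """Linear warmup to base_lr, then linear decay to zero."""
    if step <= warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)

print(lr_at(1))    # ~5e-06, matching the first logged step
print(lr_at(10))   # ~5e-05, the peak at the end of warmup
print(lr_at(11))   # ~4.9944e-05, decay begins
```

The final run was stopped manually (`^C`), which is why the trace ends in a `KeyboardInterrupt` inside `xla_spawn.py`'s process join.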